Modeling Pronunciation Variation in Conversational Speech Using Prosody
Abstract
A significant source of variation in spontaneous speech is intra-speaker pronunciation change. Previous work in automatic speech recognition has identified several factors that affect pronunciation variability, such as phonetic context and speaking rate, as well as syntactic structure. This work examines prosody, represented by attributes derived from F0, energy, and duration values, as a cue to pronunciation variability. Analyses of hand-labeled data are used to determine which instances of prosodic variables are useful for characterizing pronunciation changes; these are then used in a decision-tree-based dynamic pronunciation model. Experiments predicting phone changes show an improvement over chance when prosodic attributes are used. Adding prosodic variables to a model that uses phonetic context and word-based information yields a 14% reduction in entropy and a slight improvement in phone error rate over the baseline model.
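One common way to read an entropy reduction like the one reported above is as the drop in uncertainty about a phone's surface realization once extra conditioning attributes are known, which is also the criterion a greedy decision tree uses to pick its questions. The sketch below illustrates that idea under stated assumptions: the toy /t/ data, the attribute names `accent` (standing in for a prosodic cue) and `speaker` (a distractor), and all function names are hypothetical, and the numbers it prints are unrelated to the paper's 14% result or its actual model.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(samples, feature_keys, target_key="surface"):
    """H(target | features): entropy of the target within each group of
    samples sharing the same feature values, weighted by group size."""
    groups = defaultdict(list)
    for s in samples:
        groups[tuple(s[k] for k in feature_keys)].append(s[target_key])
    total = len(samples)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def relative_reduction(samples, base_keys, extra_keys, target_key="surface"):
    """Fraction of H(target | base) removed by also conditioning on extra_keys."""
    h_base = conditional_entropy(samples, base_keys, target_key)
    h_full = conditional_entropy(samples, base_keys + extra_keys, target_key)
    return (h_base - h_full) / h_base if h_base > 0 else 0.0

def best_question(samples, base_keys, candidate_keys, target_key="surface"):
    """One greedy decision-tree step: choose the candidate attribute whose
    addition gives the lowest conditional entropy (largest information gain)."""
    return min(candidate_keys,
               key=lambda k: conditional_entropy(samples, base_keys + [k], target_key))

# Hypothetical toy data: baseform /t/ realized as [t], a flap [dx], or deleted ("-").
def make(base, accent, speaker, surface):
    return {"base": base, "accent": accent, "speaker": speaker, "surface": surface}

samples = [
    make("t", "yes", "A", "t"), make("t", "yes", "B", "t"),
    make("t", "yes", "A", "t"), make("t", "yes", "B", "t"),
    make("t", "no", "A", "dx"), make("t", "no", "B", "dx"),
    make("t", "no", "A", "-"), make("t", "no", "B", "t"),
]

h_base = conditional_entropy(samples, ["base"])
h_full = conditional_entropy(samples, ["base", "accent"])
print(f"H(surface | base)         = {h_base:.3f} bits")
print(f"H(surface | base, accent) = {h_full:.3f} bits")
print(f"relative reduction        = {relative_reduction(samples, ['base'], ['accent']):.1%}")
print("best question:", best_question(samples, ["base"], ["accent", "speaker"]))
```

A full decision-tree pronunciation model would apply the same split criterion recursively at each node until a stopping rule (for example, a minimum node size) is met; that recursion is left out here to keep the sketch short.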
Similar resources
Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition
Modeling pronunciation variation is key for recognizing conversational speech. Rather than being limited to dictionary modeling, we argue that triphone clustering is an integral part of pronunciation modeling. We propose a new approach called enhanced tree clustering. This approach, in contrast to traditional decision tree based state tying, allows parameter sharing across phonemes. We show tha...
Modeling Pronunciation Variation in Conversational Speech using Syntax and Discourse
A significant source of variation in spontaneous speech is due to intra-speaker pronunciation changes. Previous work in automatic speech recognition has identified several factors that affect pronunciation variability such as phonetic context and speaking rate. This work examines new higher level information sources: syntax and discourse structure, specifically the relationship between these fa...
Flexible Parameter Tying for Conversational Speech Recognition
Modeling pronunciation variation is key for recognizing conversational speech. Previous efforts on pronunciation modeling by modifying dictionaries only yielded marginal improvement. Due to the complex interaction between dictionaries and acoustic models, we believe a pronunciation modeling scheme is plausible only when closely coupled with the underlying acoustic model. This paper explores the use...
Modeling pronunciation variation using artificial neural networks for English spontaneous speech
Pronunciation variation in conversational speech has caused a significant number of word errors in large vocabulary automatic speech recognition. Rule-based approaches and decision-tree-based approaches have previously been proposed to model pronunciation variation. In this paper, we report our work on modeling pronunciation variation using artificial neural networks (ANN). The results we achieve...
Pronunciation Modeling for Large Vocabulary Speech Recognition by Arthur
The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy for automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the pronunciations of words. Others model the pronunciati...